Performance Analysis for Parallel Generalized Association Rule Mining on a Large Scale PC Cluster

Authors

  • Takahiko Shintani
  • Masato Oguchi
  • Masaru Kitsuregawa
Abstract

One of the most important problems in data mining is the discovery of association rules in large databases. We previously proposed parallel algorithms for mining generalized association rules with a classification hierarchy. In this paper, we implemented the proposed algorithms on a large scale PC cluster consisting of one hundred PCs interconnected by an ATM switch, and analyzed the performance of our algorithms using a large transaction dataset. Performance evaluations show that our parallel algorithms are effective at handling skew on such large scale parallel systems.
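To make "generalized association rules with a classification hierarchy" concrete, here is a minimal sequential sketch of the support counting step. It is not the authors' parallel algorithm: it only illustrates the common idea of extending each transaction with the ancestors of its items before counting. The taxonomy, the item names, and the function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical classification hierarchy, stored as child -> parent.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
}


def ancestors(item, parent=PARENT):
    """Return all ancestors of an item, nearest first."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend(transaction, parent=PARENT):
    """Add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


def count_pairs(transactions, parent=PARENT):
    """Count support of item pairs over ancestor-extended transactions.

    Pairs made of an item and one of its own ancestors are skipped,
    because their support always equals the support of the item alone.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(extend(transaction, parent))
        for a, b in combinations(items, 2):
            if a in ancestors(b, parent) or b in ancestors(a, parent):
                continue
            counts[(a, b)] += 1
    return counts


transactions = [["jacket", "shirt"], ["ski pants", "shirt"], ["jacket"]]
pair_counts = count_pairs(transactions)
```

In this toy data no pair of leaf items occurs twice, but the generalized pair ("outerwear", "shirt") is supported by two transactions, which is the kind of rule a hierarchy-aware miner can surface.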


Similar resources

Parallel Data Mining on ATM-Connected PC Cluster and Optimization of Its Execution Environments

In this paper, we have constructed a large scale ATM-connected PC cluster consisting of 100 PCs, implemented a data mining application, and optimized its execution environment. The default parameters of the TCP retransmission mechanism cannot provide good performance for the data mining application, since many collisions occur during all-to-all multicasting in the large scale PC cluster. Using a ...

Parallel SQL Based Association Rule Mining on Large Scale PC Cluster: Performance Comparison with Directly Coded C Implementation

Data mining is becoming increasingly important as databases grow ever larger and the need to explore hidden rules in them becomes widely recognized. Currently, database systems are dominated by relational databases, and the ability to perform data mining using standard SQL queries will definitely ease the implementation of data mining. However the performance of SQL based dat...

Preliminary Experimental Results of a Parallel Association Rule Mining on ATM Connected PC Clusters

Until recently, workstations were overwhelmingly superior to personal computers in terms of performance. However, recent PC technology has dramatically increased its CPU, main memory, and cache memory performance. Therefore massively parallel computer systems are moving away from proprietary components such as CPU, disks, etc. to commodity parts. As far as applications are concerned, we believe...

Optimizing Protocol Parameters to Large Scale PC Cluster and Evaluation of its Effectiveness with Parallel Data Mining

Recently, PC clusters have come to be studied intensively as next-generation large scale parallel computers. ATM technology is a strong candidate for a de facto standard in high speed communication networks. Therefore an ATM connected PC cluster is a very promising platform from the cost/performance point of view, as a future high performance computing environment. In this paper, an ATM ...

Performance Evaluation of the Distributed Association Rule Mining Algorithms

One of the best-known problems in data mining is association rule mining. It requires very large computation and I/O traffic capacity; therefore, several distributed and parallel association rule mining algorithms have been developed. Since the association rule mining problem is NP-complete, execution time estimation of the algorithms can be very important, especially for load balancing or...

Publication date: 1999